BASIS SELECTION IN A FINITE SET

1. INTRODUCTION 

Finding a "good basis" for a vector space is a classical question in harmonic analysis and 
frame theory. A more restrictive question is to find a good basis from a given finite set of 
vectors, a topic treated in the signal processing area under names like "method of frames" 
[1], "matching pursuit" [2] and "basis pursuit" [3]. In our work we consider another variant
in which the basis is chosen within a given finite set, and must be a "good basis" for all the
vectors in the set.

Our work on basis selection is motivated by a sensor selection problem for interference 
cancellation in digital communication. This problem leads to the following question:
Given a set $a_1, a_2, \ldots, a_n$ of $n$ vectors in $\mathbb{R}^m$, where $m < n$, find a subset $S$ of
size $m$ which can serve as a good basis for the remaining $n-m$ vectors. Hence, in our problem
the goodness of the basis is evaluated relative to the same set from which it is selected.

To make the notion of a "good basis" concrete, consider the following model of $n$ linearly
distorted noisy measurements of a vector $u = (u_1, u_2, \ldots, u_m)^T$:

$x_i = \langle a_i, u \rangle + z_i, \quad i = 1, \ldots, n$ (1)

where $x_i$ is the i-th measurement, $a_i = (a_{i1}, a_{i2}, \ldots, a_{im})^T$ is the corresponding vector of
linear distortion coefficients, $z_i$ is the corresponding noise, and $\langle \cdot, \cdot \rangle$ denotes inner product.
We wish to find a subset $S \subset \{1, \ldots, n\}$ of size $|S| = m$ such that the measurements
$\{x_k, k \in S\}$ are "good sensors" for the remaining measurements $\{x_i, i \notin S\}$.

We define subset goodness in two ways: 

(i) Low noise amplification; 

(ii) Small residual entropy. 

The former notion leads to a criterion of small expansion coefficients of $a_i$ in terms of
$\{a_k, k \in S\}$, i.e. the expansion coefficients vector having a small $\ell_\infty$ norm, while the
latter notion leads to a criterion of maximum determinant of the $m \times m$ matrix $A_S$ composed
of the vectors $\{a_k, k \in S\}$.

We arrive at these criteria from probabilistic arguments. Assume that $u_1, u_2, \ldots, u_m$ are i.i.d.
random variables ~ $N(0,1)$, mutually independent of $z_1, z_2, \ldots, z_n$ which are i.i.d. ~ $N(0, \sigma^2)$.
The minimum Mean Squared Error (MSE) estimate of $x_i$ from $\{x_k, k \in S\}$, which is given in
general by the conditional expectation $\hat{x}_i = E\{x_i \mid x_k, k \in S\}$, takes in this case a linear form:

$\hat{x}_i = \langle g_i, x_S \rangle$, (2)

where $g_i = (g_{i,1}, g_{i,2}, \ldots, g_{i,m})^T$ is a vector of linear estimation coefficients, and $x_S$ is the
vector with coefficients $\{x_k, k \in S\}$. Furthermore, as $\sigma^2 \to 0$ the optimal $g_i$ approaches the Least
Squares (LS) solution, i.e., the expansion of $a_i$ in terms of the vectors $\{a_k, k \in S\}$:

$a_i = \sum_{j=1}^{m} g_{i,j}\, a_{k_j} = A_S\, g_i$, (3)

where $k_j$ denotes the j-th index in $S$.

Combining (1), (2) and (3), it follows that the estimation error of the LS solution is

$\hat{x}_i - x_i = \langle g_i, z_S \rangle - z_i$, (4)

where $z_S$ is the vector $\{z_k, k \in S\}$, hence the resulting MSE is (see footnote 1)

$E|\hat{x}_i - x_i|^2 = \sigma^2 (\|g_i\|^2 + 1)$. (5)

We see that the noise is amplified by the expansion coefficients of $a_i$ relative to the basis S.
We say that a basis S is an $\alpha$-amplifier if the expansion coefficients of all vectors outside S
are absolutely bounded by $\alpha$, i.e., for $i \notin S$

$|g_{i,j}| \le \alpha$ for all $j$. (6)

Finally we say that a basis is good in the sense of noise amplification if it is a 1-amplifier,
i.e., if $|g_{i,j}| \le 1$ for all $j$.
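The definitions in (3), (5) and (6) can be checked numerically. The following is a minimal sketch, not taken from the original work: the helper names `solve` and `expansion_coefficients` and the four example vectors are assumptions chosen for illustration, and exact rational arithmetic is used only to keep the numbers readable.

```python
from fractions import Fraction

def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination (A is a list of rows).
    Raises StopIteration if A is singular."""
    m = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(b[r])] for r, row in enumerate(A)]
    for c in range(m):
        p = next(r for r in range(c, m) if M[r][c] != 0)   # pivot row
        M[c], M[p] = M[p], M[c]
        M[c] = [x / M[c][c] for x in M[c]]
        for r in range(m):
            if r != c:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[r][m] for r in range(m)]

def expansion_coefficients(vectors, S):
    """Return {i: g_i} with a_i = sum_j g_i[j] * a_{S[j]} for every i outside S."""
    m = len(S)
    # The columns of A_S are the basis vectors, so row r is [a_k[r] for k in S].
    A_S = [[vectors[k][r] for k in S] for r in range(m)]
    return {i: solve(A_S, a) for i, a in enumerate(vectors) if i not in S}

# Hypothetical example set in R^2 (n = 4, m = 2).
vectors = [(1, 0), (0, 1), (3, 1), (1, 1)]
g = expansion_coefficients(vectors, S=[0, 1])
# Smallest alpha for which S is an alpha-amplifier, as in (6).
alpha = max(abs(c) for gi in g.values() for c in gi)
# MSE divided by sigma^2 for each vector outside S, as in (5).
mse_factor = {i: sum(c * c for c in gi) + 1 for i, gi in g.items()}
```

With the basis `S=[0, 1]` the largest coefficient is 3, so this basis is only a 3-amplifier; choosing `S=[2, 1]` instead brings every coefficient below 1 in absolute value.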

We turn to motivate the second criterion of basis goodness. The residual entropy of the 
measurements relative to a basis S is defined as the conditional differential entropy of the 
measurements outside S given the measurements in S,

$h(x_i, i \notin S \mid x_k, k \in S)$ (7)

(see [4] for the definition of h(..)). 

This quantity determines the Shannon capacity of an n-line vector channel, with additive
noises $x_1, \ldots, x_n$, assuming lines $k \in S$ act as "sensors" (provide channel side information) for
the rest of the lines. The smaller the residual entropy is, the higher is the capacity of lines
$i \notin S$.

Now, by the chain rule for joint entropy we have [4],

$h(x_1, \ldots, x_n) = h(x_k, k \in S) + h(x_i, i \notin S \mid x_k, k \in S)$. (8)

Furthermore, since $x_1, \ldots, x_n$ are jointly Gaussian,

$e^{2h(x_k, k \in S)} \propto |\det(A_S + \sigma^2)|$. (9)

Thus, since the left-hand side of (8) does not depend on S, minimizing the residual entropy
over the choice of S amounts to maximizing $|\det(A_S + \sigma^2)|$. As $\sigma^2 \to 0$ this becomes

$S^* = \arg\max_{S \subset \{1, \ldots, n\},\ |S| = m} |\det(A_S)|$. (10)

As we shall see in the sequel, the two notions of goodness (unit noise amplification and
maximum basis determinant) are closely related via Cramer's rule. Every locally optimal
solution for (10) (i.e., a subset such that replacing one vector does not increase its
determinant) is a 1-amplifier basis. However, not every 1-amplifier basis achieves the
global maximum in (10).

Footnote 1: Expression (5) is the MSE of the LS solution for any $\sigma^2$ (not only small), and it
is always an upper bound on the Bayesian MSE.

Geometrically, the determinant of $A_S$ amounts to the product of the lengths of $\{a_k, k \in S\}$
and the sines of the angles between each vector and the linear subspace spanned by the 
previous vectors (in some order). Hence, large determinant corresponds to long and close 
to orthogonal vectors. This partially resembles a search for the shortest basis of a given 
lattice. 

For a lattice the basis determinant is fixed (it is the volume of the lattice basic cell), so 
minimizing the vectors' lengths is equivalent to making the angles as close to 90° as 
possible. See the LLL algorithm, [5], for an efficient search for a reduced basis of a lattice. 

Solving (10) requires, in principle, searching all subsets and calculating their
determinants. This implies ~ $n^m$ determinant calculations. On the other hand, a possible
greedy solution (similar to matching pursuit [2]) sequentially selects the residual longest
vector in a Gram-Schmidt-like process, implying linear complexity in n. However, this
solution only guarantees a $2^{m-1}$-amplifier basis, and a far from optimum basis determinant.

2. Typical Results 

In this work we investigate the gap in performance between the optimum solution and low 
complexity variations on a greedy solution. We consider both the noise amplification and 
the maximum determinant basis selection criteria. 
Some typical results of this work are:

1. Worst noise amplification: Applying the greedy algorithm on a given set of vectors
results in a basis that is at most a $2^{m-1}$-amplifier, and this bound may be achieved.

2. Efficient search for low noise amplification basis: For every set of vectors $a_1, a_2, \ldots, a_n$,
a basis which is an $m^{1/m}$-amplifier (or better) can be found with a complexity of $O(n \cdot m^4)$.

3. Worst case determinant gap: If the absolute determinant of every concatenation of a
vector of $A_S$ with an (m-1)-subset of vectors is less than or equal to the absolute
determinant of $A_S$, then the maximal determinant M is bounded by
$M \le \sqrt{m^{m/(m-1)}}\, |\det(A_S)|$, and there exist examples for which the bound is achieved.

4. For every $m > k > 1$ we can construct examples of a set of vectors, where replacing any
subset of k vectors from $A_S$ by any k vectors does not increase the determinant of $A_S$, yet
the maximal determinant M satisfies $M \ge \left(\sqrt{m/k}\right)^m |\det(A_S)|$.

The first two results are "optimistic" in the sense of finding a low noise amplification basis
with low complexity, while the latter results are "pessimistic" about the complexity of
finding a basis with maximal determinant.

3. References 

[1] I. Daubechies, Time-frequency localization operators: a geometric phase space 
approach, IEEE Trans. on Information Theory, 34 (1988), pp. 605-612.

SELF BASIS SELECTION IN A FINITE SET II

1. INTRODUCTION

Finding a "good basis" for a vector space is a classical question in harmonic analysis and 
frame theory [1]. A more restrictive question is to find a good basis from a given finite set 
of vectors, a topic treated in the signal processing area under names like "matching pursuit" 
[2] and "basis pursuit" [3]. 

Our work on basis selection is motivated by a sensor selection problem for interference 
cancellation in digital communication. This problem leads to the following question: 
Given a set $a_1, a_2, \ldots, a_n$ of $n$ vectors in $\mathbb{R}^m$, where $m < n$, find a subset $S$ of size $m$ which
can serve as a good basis for the remaining $n-m$ vectors. Hence, in our problem the goodness
of the basis is evaluated relative to the same set from which it is selected.

To make the notion of a "good basis" concrete, consider the following model of $n$ linearly
distorted noisy measurements of a vector $u = (u_1, u_2, \ldots, u_m)^T$:

$x_i = \langle a_i, u \rangle + z_i, \quad i = 1, \ldots, n$ (1)

where $x_i$ is the i-th measurement, $a_i = (a_{i1}, a_{i2}, \ldots, a_{im})^T$ is the corresponding vector of
linear distortion coefficients, $z_i$ is the corresponding noise, and $\langle \cdot, \cdot \rangle$ denotes inner product.
We wish to find a subset $S \subset \{1, \ldots, n\}$ of size $|S| = m$ such that the measurements
$\{x_k, k \in S\}$ are "good sensors" for the remaining measurements $\{x_i, i \notin S\}$.

We define subset goodness in two ways: 

(i) Low noise amplification; 

(ii) Small residual entropy. 

The former notion leads to a criterion of small expansion coefficients of $a_i$ in terms of
$\{a_k, k \in S\}$, while the latter notion leads to a criterion of maximum determinant of the
$m \times m$ matrix $A_S$ composed of the basis vectors $\{a_k, k \in S\}$.

We arrive at these criteria from probabilistic arguments. Assume that $u_1, u_2, \ldots, u_m$ are i.i.d.
random variables ~ $N(0,1)$, mutually independent of $z_1, z_2, \ldots, z_n$ which are i.i.d. ~ $N(0, \sigma^2)$.
The minimum Mean Squared Error (MSE) estimate of $x_i$ from $\{x_k, k \in S\}$, which is given
in general by the conditional expectation $\hat{x}_i = E\{x_i \mid x_k, k \in S\}$, takes in this case a linear
form:

$\hat{x}_i = \langle g_i, x_S \rangle$, (2)

where $g_i = (g_{i,1}, g_{i,2}, \ldots, g_{i,m})^T$ is a vector of linear estimation coefficients, and $x_S$ is the
vector with coefficients $\{x_k, k \in S\}$. Furthermore, as $\sigma^2 \to 0$ the optimal $g_i$ approaches the Least
Squares (LS) solution, i.e., the expansion of $a_i$ in terms of the vectors $\{a_k, k \in S\}$:

$a_i = \sum_{j=1}^{m} g_{i,j}\, a_{k_j} = A_S\, g_i$, (3)

where $k_j$ denotes the j-th index in $S$.
Combining (1), (2) and (3), it follows that the estimation error of the LS solution is

$\hat{x}_i - x_i = \langle g_i, z_S \rangle - z_i$, (4)

where $z_S$ is the vector $\{z_k, k \in S\}$, hence the resulting MSE is (see footnote 2)

$E|\hat{x}_i - x_i|^2 = \sigma^2 (\|g_i\|^2 + 1)$. (5)

We see that the noise is amplified by the expansion coefficients of $a_i$ relative to the basis
S.

We say that a basis S is an $\alpha$-amplifier if the expansion coefficients of all vectors outside S
are absolutely bounded by $\alpha$, i.e., for $i \notin S$

$|g_{i,j}| \le \alpha$ for all $j$. (6)

Finally, we say that a basis is good in the sense of noise amplification if it is a 1-amplifier,
i.e., if $|g_{i,j}| \le 1$ for all $j$.

We turn to motivate the second criterion of basis goodness. The residual entropy of the 
measurements relative to a basis S is defined as the conditional differential entropy of the 
measurements outside S given the measurements in S,

$h(x_i, i \notin S \mid x_k, k \in S)$ (7)

(see [4] for the definition of h(..)). 

This quantity determines the Shannon capacity of an n-line vector channel, with additive
noises $x_1, \ldots, x_n$, assuming lines $k \in S$ act as "sensors" (provide channel side information) for
the rest of the lines. The smaller the residual entropy is, the higher is the capacity of lines
$i \notin S$.

Now, by the chain rule for joint entropy we have [4],

$h(x_1, \ldots, x_n) = h(x_k, k \in S) + h(x_i, i \notin S \mid x_k, k \in S)$. (8)

Furthermore, since $x_1, \ldots, x_n$ are jointly Gaussian,

$e^{2h(x_k, k \in S)} \propto |\det(A_S + \sigma^2)|$. (9)

Thus, since the left-hand side of (8) does not depend on S, minimizing the residual entropy
over the choice of S amounts to maximizing $|\det(A_S + \sigma^2)|$. As $\sigma^2 \to 0$ this becomes

$S^* = \arg\max_{S \subset \{1, \ldots, n\},\ |S| = m} |\det(A_S)|$. (10)

Footnote 2: Expression (5) is the MSE of the LS solution for any $\sigma^2$ (not only small), and it
is always an upper bound on the Bayesian MSE.

As we shall see in the sequel, the two notions of goodness (unit noise amplification and 
maximum basis determinant) are closely related via Cramer's rule. Every locally optimal
solution for (10) (i.e., a subset such that replacing one vector does not increase its 
determinant) is a 1-amplifier basis. However, not every 1-amplifier basis achieves the 
global maximum in (10). 

Geometrically, the determinant of $A_S$ amounts to the product of the lengths of $\{a_k, k \in S\}$
and the sines of the angles between each vector and the linear subspace spanned by the
previous vectors (in some order). Hence, a large determinant corresponds to long and close
to orthogonal vectors. This partially resembles a search for the shortest basis of a given
lattice.

For a lattice the basis determinant is fixed (it is the volume of the lattice basic cell), so
minimizing the vectors' lengths is equivalent to making the angles as close to 90° as
possible. See the LLL algorithm, [5], for an efficient search for a reduced basis of a
lattice.

Solving (10) requires, in principle, searching all $\binom{n}{m}$ subsets and calculating their
determinants. This implies ~ $n^m$ determinant calculations. On the other hand, a possible
greedy solution (as in matching pursuit [2]) sequentially selects the residual longest vector
in a Gram-Schmidt-like process, implying linear complexity in n. However, this solution
only guarantees a $2^{m-1}$-amplifier basis (see Section 3), and a far from optimum basis
determinant.

In this work we investigate the gap in performance between the optimum solution and low 
complexity variations on the greedy solution above. We consider both the noise 
amplification and the maximum determinant basis selection criteria. The next section 
describes the basis selection algorithms, while Sections 3 and 4 present our results relative 
to the two goodness criteria. 



2. Basis Selection Algorithms 

We shall consider two algorithms for Basis Selection.

1. Residual Longest Vector Selection.

2. One by One Replacement Algorithm.

Eventually we shall combine the two algorithms to get a low complexity algorithm for
choosing a basis which is close to a 1-amplifier basis.

1. Residual Longest Vector Selection.
This algorithm proceeds as follows:

Given a set $a_1, a_2, \ldots, a_n$ of n vectors in $\mathbb{R}^m$, define an m-stage algorithm for choosing
a set $S \subset \{1, \ldots, n\}$ of size $|S| = m$, such that $\{a_k, k \in S\}$ will serve as a good basis for the
given set.

Define $I$ to be the $m \times m$ identity matrix, $Q_0 = 0$, and $S_0$ to be the empty set.
At each stage $i$ do the following:

a. Choose $k_i = \arg\max_k |(I - Q_{i-1} Q_{i-1}^T)\, a_k|$ (i.e. choose the vector with maximal
difference from its projection on the span of its predecessors).

b. Define $S_i$ to be the (ordered) set of indices already chosen, i.e. $S_i = \bigcup_{j=1}^{i} \{k_j\}$,
and define $A_{S_i}$ to be the $m \times i$ matrix composed of the (ordered) basis vectors
$\{a_k, k \in S_i\}$.

c. Compute a QR decomposition of $A_{S_i}$, i.e. compute $A_{S_i} = Q_i \cdot R_i$, where $Q_i$ is
a matrix of order $m \times i$ with orthonormal columns, and $R_i$ is upper triangular of
order $i \times i$.

d. If $i < m$, return to step a; else if $i = m$, define $S = S_m$, $A_S = A_{S_m}$, and stop.

The principle of this algorithm is similar to that of Matching Pursuit [2], and it is a
natural first choice of basis. It provides some bounds on the noise amplification and on
the maximal determinant, but we shall see that these bounds are quite high.
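The stages of the Residual Longest Vector Selection can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function name `residual_longest_selection` is hypothetical, the explicit QR factors are replaced by an equivalent unnormalized Gram-Schmidt update of the residuals, and exact rational arithmetic is used so that squared norms can be compared without rounding.

```python
from fractions import Fraction

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def residual_longest_selection(vectors):
    """Greedy selection: at each stage pick the vector with the longest residual
    after projecting out the span of the vectors already chosen."""
    m = len(vectors[0])
    residuals = [[Fraction(x) for x in v] for v in vectors]
    S = []
    for _ in range(m):
        # Step a: index of the longest residual (squared norms avoid square roots).
        k = max(range(len(vectors)), key=lambda i: dot(residuals[i], residuals[i]))
        q = residuals[k][:]
        qq = dot(q, q)
        if qq == 0:
            raise ValueError("the vectors do not span R^m")
        S.append(k)
        # Steps b-c: remove the component along q from every residual, which has
        # the same effect as applying (I - Q Q^T) with the enlarged Q.
        for i, r in enumerate(residuals):
            f = dot(r, q) / qq
            residuals[i] = [x - f * y for x, y in zip(r, q)]
    return S

# Hypothetical example set in R^2.
S = residual_longest_selection([(1, 0), (0, 1), (3, 1), (1, 1)])
```

For this example set the longest vector (3, 1) is chosen first, and (0, 1) has the longest residual afterwards. Each stage costs O(n m) operations, which is where the linear complexity in n comes from.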

2. One by One Replacement Algorithm.

This algorithm is a simple replacement algorithm.

Given a first choice for an m-subset S and the corresponding matrix $A_S$, the algorithm
proceeds as follows:

a. Define a threshold $\alpha \ge 1$ and a maximal number of iterations $l$.

b. Compute the expansion coefficients of all the vectors $a_1, a_2, \ldots, a_n$ relative
to the given basis (i.e. for each $a_i$ compute $g_i = A_S^{-1} \cdot a_i$).

c. Find $G = \max_{i,j} |g_{i,j}|$ and the coordinates $i, j$ for which the maximum is
achieved.

d. Compare G with $\alpha$. If it is greater than $\alpha$ then replace the j-th vector of
$A_S$ by $a_i$. Update the set S and the matrix $A_S$. If the maximal number of
iterations has not been reached, repeat the algorithm from step b.

e. If G is not greater than $\alpha$, then the algorithm stops and the current basis is an
$\alpha$-amplifier or better.

If it is known that the maximal determinant M is bounded by $M \le D \cdot |\det(A_S)|$ for the
original S, then choosing $\alpha = D^{1/(l+1)}$ guarantees that the algorithm will find a basis
which is an $\alpha$-amplifier or better in no more than $l$ iterations. (By Cramer's rule each
replacement multiplies $|\det(A_S)|$ by $G > \alpha$, so a further replacement after $l$ of them would
produce a determinant larger than $\alpha^{l+1} |\det(A_S)| = D \cdot |\det(A_S)| \ge M$.)

Moreover we shall see that with a complexity of $O(n \cdot m^4)$ the combination of the two
algorithms can find a basis which is quite close to a 1-amplifier. (Note that the complexity
of the full search is $O\left(\binom{n}{m}\right)$, which can be considerably higher than $O(n \cdot m^4)$ for $n \gg m \gg 1$.)
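Steps a to e of the One by One Replacement Algorithm can be sketched as follows. This is a hedged illustration rather than the authors' code: the names `solve` and `one_by_one_replacement`, the `max_iter` default and the example vectors are assumptions, and the linear systems are solved exactly with rational arithmetic instead of a numerical inverse of the basis matrix.

```python
from fractions import Fraction

def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination (A is a list of rows).
    Raises StopIteration if A is singular."""
    m = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(b[r])] for r, row in enumerate(A)]
    for c in range(m):
        p = next(r for r in range(c, m) if M[r][c] != 0)   # pivot row
        M[c], M[p] = M[p], M[c]
        M[c] = [x / M[c][c] for x in M[c]]
        for r in range(m):
            if r != c:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[r][m] for r in range(m)]

def one_by_one_replacement(vectors, S, alpha=1, max_iter=100):
    """Return (S, converged): swap basis vectors until all coefficients are <= alpha."""
    S, m = list(S), len(S)
    for _ in range(max_iter):
        A_S = [[vectors[k][r] for k in S] for r in range(m)]
        # Steps b-c: find the largest |g_ij| over the vectors outside the basis.
        G, i_max, j_max = Fraction(0), None, None
        for i, a in enumerate(vectors):
            if i in S:
                continue  # basis vectors expand into unit vectors, never above alpha >= 1
            for j, c in enumerate(solve(A_S, a)):
                if abs(c) > G:
                    G, i_max, j_max = abs(c), i, j
        # Step e: stop when the current basis is an alpha-amplifier.
        if G <= alpha:
            return S, True
        # Step d: replace the j-th basis vector by a_i.
        S[j_max] = i_max
    return S, False

# Hypothetical example: start from the standard basis of R^2.
S, converged = one_by_one_replacement([(1, 0), (0, 1), (3, 1), (1, 1)], [0, 1])
```

Starting from `[0, 1]`, a single replacement (the first basis vector is swapped for (3, 1)) already yields a 1-amplifier basis. Each pass solves n linear systems of size m, which matches the remark that the coefficients are found by solving linear equations rather than by computing determinants.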

3. Noise Amplification Results 

Applying the Residual Longest Vector Selection algorithm on a given set of vectors
$a_1, \ldots, a_n$ results in a basis which is at worst a $2^{m-1}$-amplifier. More precisely, for each $a_i$
and for $j = 1, \ldots, m$, the expansion coefficients $g_{i,j}$ (equation (3)) satisfy $|g_{i,j}| \le 2^{m-j}$.

Furthermore, $|\det(A_S)| \ge M / \sqrt{m^m}$, where M denotes the maximal absolute value of all the
determinants associated with a subset of m vectors of $a_1, \ldots, a_n$.

These results may be quite tight: there exist examples of a vector set for which the
algorithm indeed results in a basis which is quite close to a $2^{m-1}$-amplifier, and there exist
examples for which the maximal determinant is indeed $M = \sqrt{m^m}\, |\det(A_S)|$.

Combining the two algorithms we can prove that for each $l$, performing at most $l$ single
replacements can reduce the basis to a basis that is a $\left(\sqrt{m^m}\right)^{1/(l+1)}$-amplifier or better.
Substituting for example $l = m^2/2$ we get an $m^{1/m}$-amplifier.

The complexity of the suggested algorithm is $O(n \cdot m^4)$ and it results in a basis which is
quite close to a 1-amplifier. (Note that the complexity of the full search is $O\left(\binom{n}{m}\right)$, which can
be considerably higher than $O(n \cdot m^4)$ for $n \gg m \gg 1$.)



4. Residual Entropy Results 

Residual Entropy is associated with finding a set with maximal determinant. 

The results we have in this case are more "pessimistic", i.e. our results suggest that finding 

the maximal determinant may be a problem of high complexity. 

As mentioned above, applying the Residual Longest Vector Selection algorithm on a given
set of vectors $a_1, \ldots, a_n$ results in a basis satisfying $|\det(A_S)| \ge M / \sqrt{m^m}$.

We can construct examples where this bound is tight. In particular for each $\varepsilon > 0$, there exist
examples for which the Residual Longest Vector Selection results in a 1-amplifier basis
(i.e. any replacement of only 1 vector will not increase the determinant of $A_S$), however the
maximal determinant M satisfies $M \ge \left(\sqrt{m/(1+\varepsilon)}\right)^m |\det(A_S)|$.

Moreover, for every $m > k > 1$, we can construct examples of a set of vectors where in
addition to the above, replacing any subset of k vectors from $A_S$ by any k vectors does not
increase the determinant of $A_S$, yet the maximal determinant M satisfies

$M \ge \left(\sqrt{m/k}\right)^m |\det(A_S)|$.

For $k = m-1$ we have a tighter bound. If replacing any m-1 vectors of $A_S$ by any m-1 vectors
will not increase the determinant of $A_S$, then the maximal determinant M is bounded by
$M \le \sqrt{m^{m/(m-1)}}\, |\det(A_S)|$, and there exist examples for which the bound is achieved.

These results suggest that finding the maximal determinant, or even a subset associated 
with a determinant which is close to the maximal determinant, may be a problem with high 
complexity, and may require an exhaustive search.

Choosing a Good Base for ACTC.
Algebraic Part



INTRODUCTION: 

In this paper we will introduce a concept of a good base and provide a low complexity 
algorithm for computing a good base from a given set of vectors. 

The problem presented here and the algorithms for computing the good base are directly 
connected to the problem of choosing a good sensor set for ACTC. 
In this paper though, we will deal only with the mathematical part. 

GOOD BASE: 

Pseudo-Definition: Given a finite set of vectors V, a good base for span(V) is a subset
$H \subset V$ such that H is a base for span(V) and every vector $v \in V$ can be expressed as a linear
combination of elements of H with all the coefficients "small" (in absolute value).

Definition: Given a finite set of vectors V and a positive scalar $\alpha$, an $\alpha$-good base for
span(V) is a subset $H \subset V$ such that H is a base for span(V) and every vector $v \in V$ can be
expressed as a linear combination of elements of H with all the absolute values of the
coefficients $\le \alpha$.

Lemma 1: For every finite set (or even closed and bounded infinite set) $V \subset \mathbb{R}^m$, there
exists an $\alpha$-good base with $\alpha = 1$.

Proof: W.l.o.g. we may assume that span(V) = $\mathbb{R}^m$. For every subset H of m vectors of V,
compute the determinant, and choose a subset H with the greatest absolute determinant.
According to Cramer's rule, when expressing any other vector $v \in V$ as a linear combination
of the elements of H, the coefficients are ratios of determinants where the numerator is the
determinant of a matrix generated from H by changing one of its columns to v and the
denominator is the determinant of H.
The result follows from the maximality of $|\det(H)|$.
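The Cramer's rule argument in the proof can be illustrated on a small example. The sketch below is an assumption-laden illustration, not part of the original text: `det` and `cramer_coefficients` are hypothetical helper names, and the vectors are chosen so that H = {(3, 1), (0, 1)} has the largest absolute determinant (3) among all pairs taken from {(1, 0), (0, 1), (3, 1), (1, 1)}.

```python
from fractions import Fraction

def det(A):
    """Exact determinant by Gaussian elimination (A is a list of rows)."""
    M = [[Fraction(x) for x in row] for row in A]
    n, d = len(M), Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return Fraction(0)          # no pivot: the matrix is singular
        if p != c:
            M[c], M[p] = M[p], M[c]
            d = -d                      # a row swap flips the sign
        d *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return d

def cramer_coefficients(H_cols, v):
    """Coefficients c with v = sum_i c[i] * H_cols[i], via c_i = det(H_i^v) / det(H)."""
    m = len(H_cols)
    as_rows = lambda cols: [[col[r] for col in cols] for r in range(m)]
    d = det(as_rows(H_cols))
    return [det(as_rows(H_cols[:i] + [v] + H_cols[i + 1:])) / d for i in range(m)]

# H is the maximal-determinant pair of the hypothetical example set.
H = [(3, 1), (0, 1)]
coeffs = {v: cramer_coefficients(H, v) for v in [(1, 0), (1, 1)]}
```

Every numerator is the determinant of another pair from the set, so it cannot exceed |det(H)| = 3 in absolute value; accordingly the coefficients come out as (1/3, -1/3) and (1/3, 2/3), all bounded by 1.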

Lemma 1 tells us that there exists a "very good" base, but the complexity of computing
that good base is very large. One has to go over all subsets of size m and compute
their determinants before finding the good base.

We will now proceed to show that a naive algorithm for choosing a "good" base for V may
result in an $\alpha$-good base with $\alpha \approx 2^m$.

We will suggest an alternative way of finding an $\alpha$-good base, with $\alpha < m$, but with bounded
complexity. We will also show the dependency of $\alpha$ on the complexity.






The first step in the suggested algorithm is to perform a generalized QR decomposition or 
generalized Gram Schmidt orthogonalization process, according to the following: 

1. Find the largest vector in V. Choose it as the first vector in the "good" base.

2. Normalize the vector and choose the normalized vector as the first column of Q.

3. After computing i vectors, compute the next vector as the vector whose difference
from its projection on the span of the previous columns of Q is the largest. Choose
that vector as the (i+1)-th vector.

4. Normalize the difference of the (i+1)-th vector from its projection on the span of the
previous vectors, and choose the normalized difference vector as the (i+1)-th column of Q.

5. Increment i by 1 and return to step 3 (unless i is greater than a predefined value).

After completing this process we have an orthonormal base Q and an upper triangular
matrix R with the following properties:

1. H = QR, where H is the matrix whose columns are the vectors chosen for the base.

2. For every $v \in V$, define y to be the vector such that v = Qy. The columns of H are
then associated with the columns of R, while for every $v \in V$ its associated
vector y has the following property:

$\sum_{j \ge i} y_j^2 \le r_{i,i}^2$ for every i (where $r_{i,i}$ is the i-th diagonal element of R).

In particular, for every $v \in V$, $y_i^2 \le r_{i,i}^2$.

In the sequel we shall assume that the original vector set V was given in that format, i.e. the
first m vectors of V are the columns of R, and in addition every $v \in V$ satisfies property 2
above.

Notations: 

1. R: an upper triangular matrix.

2. $r_{i,i}$: the i-th diagonal element of R.

3. $A_i^v$: for a matrix A and a vector v, the matrix obtained by replacing the i-th
column of A by the vector v.

Lemma 2: The columns of R are an $\alpha$-good base for the set V with $\alpha = 2^{m-1}$.

Proof: By Cramer's rule, the coefficients involved in expressing the vector
v as a linear combination of the columns of R are ratios of determinants:
$c_i = \det(R_i^v) / \det(R)$.

Since $\det(R) = \prod_{j=1}^{m} r_{j,j}$, the proof will follow immediately from the following:

Lemma 3: $|\det(R_i^v)| \le 2^{m-i} \prod_{j=1}^{m} r_{j,j}$.

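Proof: Recall the basic definition of the determinant of an $m \times m$ matrix $A = (a_{i,j})$ as a sum
of products associated with permutations:

$\det(A) = \sum_{\sigma} \operatorname{sign}(\sigma) \prod_{j=1}^{m} a_{\sigma(j),j}$.

In our situation, if the i-th column of R is replaced by a vector v we get a matrix of the
following form:

$$
R_i^v = \begin{pmatrix}
r_{1,1} & \cdots & * & v_1 & * & \cdots & * \\
0 & \ddots & \vdots & \vdots & \vdots & & \vdots \\
0 & \cdots & r_{i-1,i-1} & v_{i-1} & * & \cdots & * \\
0 & \cdots & 0 & v_i & * & \cdots & * \\
0 & \cdots & 0 & v_{i+1} & r_{i+1,i+1} & \cdots & * \\
\vdots & & \vdots & \vdots & 0 & \ddots & \vdots \\
0 & \cdots & 0 & v_m & 0 & \cdots & r_{m,m}
\end{pmatrix}
$$

When choosing $\sigma(j)$ for the first i-1 columns there is only one choice that can be made,
namely $\sigma(j) = j$. For all other choices of $\sigma(j)$, the result will be 0.
If we now skip the i-th column and pick $\sigma(j)$ for the columns from i+1 to m, we can
choose $\sigma(j)$ for each column to be either $\sigma(j) = j$ or the one row index between i and j-1
that was not yet chosen; thus for each column we have 2 valid choices of $\sigma(j)$.

Altogether, we have $2^{m-i}$ valid choices of $\sigma$.
The entry taken from the i-th column is then uniquely determined.

With each choice of $\sigma$ we associate a product $\operatorname{sign}(\sigma) \prod_{j} a_{\sigma(j),j}$, where each element
in the product satisfies $|a_{\sigma(j),j}| \le |r_{\sigma(j),\sigma(j)}|$,

thus the total product satisfies $\left| \operatorname{sign}(\sigma) \prod_{j} a_{\sigma(j),j} \right| \le |\det(R)|$.
The result follows from the fact that there are only $2^{m-i} \le 2^{m-1}$ valid permutations.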
DETERMINANT BOUNDS: 

In this section we will give some absolute bounds and some relative bounds on the 
determinants of mxm matrices whose columns are subsets of the set V satisfying all the 
properties discussed above. Bear in mind that we have a reference matrix, namely the 

matrix R above, whose determinant equals $\det(R) = \prod_{j=1}^{m} r_{j,j}$.

1. Relative bounds (depending on the values of the $r_{i,i}$).

a. For each $m \times m$ matrix M whose columns are vectors from the set V, we
have:

$\det(M) \le r_{1,1}^m$

Proof: Each column of M is a vector whose norm is $\le r_{1,1}$, and the absolute determinant
is at most the product of the column norms; therefore the result follows.

2. Absolute Bounds (depending only on the dimension m, and the basic determinant 
det(7?)). 

a. For each $m \times m$ matrix M which was obtained from R by replacing $k < m$ columns
with columns from the set V, we have:

$\det(M) \le (k+1)^{m-k}\, k!\, \det(R)$.

For replacing all the m columns we get:

$\det(M) \le m!\, \det(R)$

Proof: The worst case is easily seen to be when replacing the first k columns of R. In
that case the number of possible choices for $\sigma(j)$, for the last columns beginning
with the (k+1)-th column, is k+1. For the m-k columns we get $(k+1)^{m-k}$ choices. For
the remaining first k columns we are left with k! choices and the result follows.

b. For each $m \times m$ matrix M which was obtained from R by replacing $k < m$ columns
with columns from the set V, we have:

$\det(M) \le \sqrt{m^k\, m!/k!}\; \det(R)$

For replacing all the m columns we get:

$\det(M) \le \sqrt{m^m}\, \det(R)$

Proof: Compute the determinant in rows. In each row i of the matrix, each element
is no larger than $r_{i,i}$.

The worst case happens when replacing the first k columns of R.

In this case, the first k+1 rows of the new matrix can contain m non-zero elements,
while from then on, the number of non-zero elements decreases by 1 for every row.

Therefore the norm of each row j of the first k+1 rows is at most $\sqrt{m}\, |r_{j,j}|$.

From row k+2 down to the last, the multiplicative factor decreases for each row
until we get that the bound for the last row is $\sqrt{k+1}\, |r_{m,m}|$.



When computing a bound on the determinant, one must always take the minimum of the
above bounds; thus for a small number of replacements one must choose between the
bound $r_{1,1}^m$ and the bound $(k+1)^{m-k}\, k!\, \det(R)$, while for a large number of replacements
the choice is between $r_{1,1}^m$ and $\sqrt{m^k\, m!/k!}\; \det(R)$.
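To see where the two absolute bounds cross over, their factors can be tabulated. The sketch below assumes the bounds read $(k+1)^{m-k}\, k!\, \det(R)$ for bound (a) and $\sqrt{m^k\, m!/k!}\, \det(R)$ for bound (b), as reconstructed from the text; the function name `absolute_bound_factors` is hypothetical.

```python
from math import factorial, sqrt

def absolute_bound_factors(m, k):
    """Factors multiplying det(R) in absolute bounds (a) and (b)
    when k of the m columns of R are replaced."""
    a = (k + 1) ** (m - k) * factorial(k)
    # factorial(m) is divisible by factorial(k), so integer division is exact.
    b = sqrt(m ** k * (factorial(m) // factorial(k)))
    return a, b

# For m = 4: bound (a) is smaller for one replacement, bound (b) for all four.
few = absolute_bound_factors(4, 1)    # (8, sqrt(96))
full = absolute_bound_factors(4, 4)   # (24, 16.0)
```

For k = m the two factors reduce to m! and $\sqrt{m^m}$, which agrees with the special cases stated for replacing all the columns.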

Theorem: Suppose B is a bound on the determinant of a replacement matrix with k+1
replacements. The following algorithm will generate an $\alpha$-good base with

$\alpha \le (B / \det(R))^{1/(k+1)}$

while performing at most k column replacements of the matrix R.

Algorithm:

1. Denote $M_0 = R$.

2. At each stage i denote the replacement matrix of that stage as $M_i$. Compute all the
determinants of matrices which are replacements of 1 column of $M_i$ and check their
absolute value.

3. If the maximal absolute value of a determinant thus obtained, denoted $MaxM_i$, is such that
$|MaxM_i / \det(M_i)| \le (B / \det(R))^{1/(k+1)}$, then choose $M_i$ as the base, else,

4. Define $M_{i+1}$ to be the matrix for which the maximum of the determinant was obtained
and go to step 2.

Proof: It is obvious that if at a stage $i < k$ condition 3 was fulfilled, then $M_i$ is indeed an
$\alpha$-good base with $\alpha \le (B / \det(R))^{1/(k+1)}$.

If not, then at stage k we get: $|\det(M_k)| \ge \det(M_0) \cdot (B / \det(R))^{k/(k+1)}$.

If there exists a vector v whose representation in terms of the base defined by $M_k$ includes
coefficients greater than $(B / \det(R))^{1/(k+1)}$, this means that the determinant of $(M_k)_j^v$, for
some column j, is greater than $\det(M_k) \cdot (B / \det(R))^{1/(k+1)} \ge \det(M_0) \cdot (B / \det(R)) = B$.

But B is the bound on a replacement matrix with k+1 replacements. Thus condition 3
cannot fail for more than k successive iterations.

REMARK: In the above algorithm, one does not have to compute each determinant 
individually, but rather to solve the linear equation system for each of the vectors. The 
complexity for this process is relatively small, (linear in n (the number of vectors), cubic in 
m). 

EXAMPLES: 

1. For no replacements we get $\alpha$ as large as $2^m$.

2. For one replacement we get a < 

3. For m replacements the bound gets dramatically reduced to $\alpha \le \sqrt{m}$.

4. For $m^2/2$ replacements the bound gets further reduced to $\alpha \le m^{1/m} \to 1$.
EXTENSION: 

In the following section we shall generalize the problem of approximating a finite set of 
vectors in an m dimensional vector space by a set of k<m vectors. 
For that matter we need to generalize our definitions: 

Definition: Given a finite set of vectors V and positive scalars $\alpha, \lambda$, an $(\alpha, \lambda)$-good
approximating set for V is a subset $H \subset V$ such that every vector $v \in V$ can be
approximated as a linear combination of elements of H, $\sum_{i=1}^{k} c_i h_i$, such that:



The motivation for such a definition is that it leaves room for balancing between two 
different needs: 

1. Generate a close approximation of every vector v. The first term in the inequality 
takes care of that. 

2. Avoid noise enhancement. That is the job of the second term. 



In Donoho et al. there is a slightly different approach. They look at the quantity:

$\left\{ 0.5\, \left\| v - \sum_i c_i h_i \right\|_2^2 + \lambda \cdot \mathrm{sum}(|c_i|) \right\} < \alpha$



Ignoring the first term (in this case $\lambda$ has no meaning), we can extend the contents of the
first part of the paper to the case where the number of vectors in the dictionary we choose is
smaller than the dimension of span(V).

Theorem (Extension of Lemma 1): For every finite set (or even closed and bounded infinite
set) $V \subset \mathbb{R}^m$, and every $k \le m$, there exists a subset H containing k vectors such that H is a
good base with $\alpha = 1$ for the set $P_H(V)$, where $P_H(V)$ is the (orthogonal) projection of V onto
span(col(H)).

Proof: Choose H such that det(H) is maximal, where det(H) for a rectangular $n \times m$ matrix
with $n > m$ is defined by $\det^2(H) = \det(H^T H)$.
For every $v \in V$ solve the equation: $Hx = P(v)$.
If Q is a matrix whose orthonormal columns span span(col(H)) (for example the Q factor of
the QR decomposition of H), we get: $Hx = P(v) = Q Q^T v$.

Multiplying by $Q^T$ we get: $Q^T H x = Q^T v$.
According to Cramer's rule, each coordinate of x is given by:

$x_i = \det(Q^T H_i^v) / \det(Q^T H)$.

A generalization of the Cauchy-Schwarz inequality shows that for every two $n \times m$ matrices A, B
($n > m$),

$|\det(A^T B)| \le |\det(A) \det(B)|$.

For non-zero determinants, equality is achieved if and only if
span(col(A)) = span(col(B)).

Applying this to our case we have: $\det(Q^T H_i^v) \le \det(H_i^v)$, and $\det(Q^T H) = \det(H)$; therefore
the result follows from the maximality of det(H).

This proof shows that we can handle the noise enhancement problem even with a small 
"dictionary". 

The algorithm which was defined above for k=m, may be used in this case also, while the 
determinant bounds have to be recalculated. 

ESTIMATION ERROR: 

Suppose $H = \{h_1, h_2, \ldots, h_{m-1}, h_m\}$ is a dictionary for the m-dimensional set V. 
Suppose now that we omit one vector from the dictionary, say $h_m$. 

Consider a vector v which was originally expressed as: 

$$v = \sum_{i=1}^{m} a_i h_i$$

After the omission of $h_m$, only the projection of v on the subspace generated by the first m-1 
vectors can be expressed: 

$$P(v) = \sum_{i=1}^{m-1} a_i h_i + a_m P(h_m)$$

In this case, an estimation error $a_m (h_m - P(h_m))$ is generated. 

The estimation error is clearly smaller when $h_m$ is closer to its projection, or in other words 
when the angle between $h_m$ and the subspace generated by the first m-1 vectors is small. 

ESTIMATION ERROR vs. NOISE ENHANCEMENT: 

If in the above example we express $P(h_m)$ as $P(h_m) = \sum_{i=1}^{m-1} \beta_i h_i$, then: 

$$P(v) = \sum_{i=1}^{m-1} a_i h_i + a_m P(h_m) = \sum_{i=1}^{m-1} (a_i + a_m \beta_i) h_i$$

In order to ensure low noise enhancement we would like to have the coefficients $\beta_i$ as 
small as possible. One way of keeping the coefficients small is to have a large projection 
error of $h_m$. 

But this contradicts, in some sense, the earlier requirement of a small estimation error, 
which is achieved with a small projection error. 

How to balance the two requirements is left for further study. 
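
The tension between the two requirements can be seen in the smallest possible case. The sketch below is an assumed toy setting, not taken from the original: m = 2, a unit vector $h_1$, and a unit vector $h_m$ at angle `theta_deg` to it. It returns the single expansion coefficient of $P(h_m)$ and the norm of the estimation error $a_m(h_m - P(h_m))$. The function name and parameters are hypothetical.

```python
import math

def tradeoff(theta_deg, a_m=1.0):
    """Return (beta, error) when h_m is dropped from a 2-vector dictionary.

    h_1 = (1, 0) and h_m = (cos theta, sin theta) are unit vectors.
    beta is the coefficient of P(h_m) on h_1; error is |a_m| * ||h_m - P(h_m)||.
    """
    theta = math.radians(theta_deg)
    h1 = (1.0, 0.0)
    hm = (math.cos(theta), math.sin(theta))
    beta = hm[0] * h1[0] + hm[1] * h1[1]          # projection coefficient
    residual = (hm[0] - beta * h1[0], hm[1] - beta * h1[1])
    error = abs(a_m) * math.hypot(residual[0], residual[1])
    return beta, error

for angle in (10, 45, 80):
    print(angle, tradeoff(angle))
```

In this toy case the coefficient equals cos(theta) and the error equals sin(theta), so reducing one necessarily increases the other.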
Choosing a Vector Set with Maximal Determinant 
Consider the following question: 

Given a set $V = \{v_1, v_2, v_3, \ldots, v_n\} \subset R^m$, and assuming that span(V) = $R^m$: 

Choose a subset $W \subset V$ such that $W = \arg\max_{U \subset V,\ |U| = m} |\det(U)|$. 

An exhaustive search will solve the problem for a finite V, and the complexity of such a 
solution will be $O(\binom{n}{m} m^3)$, since there are $\binom{n}{m}$ subsets and the complexity of each 
determinant computation is $O(m^3)$. 

A simple low complexity solution to find a set with a determinant "close" to the maximum 
is a generalization of the Gram-Schmidt algorithm, which proceeds as follows: 

1. Find the vector of largest norm in V. 

2. Loop: 

After choosing i vectors, the (i+1)-th vector is chosen to be the vector whose 
difference from its orthogonal projection on the span of the first i vectors has 
maximal norm. 

The complexity of this algorithm is considerably lower than that of the exhaustive search. 

It consists of at most m stages. 

In each stage a search is done over at most n vectors. 

For each vector in the search a projection computation is done, which takes fewer than $m^2$ 
operations. Altogether the complexity is reduced to ~$n m^3$. 

However this solution does not necessarily achieve the maximal determinant. 
Some bounds on the performance of this algorithm were introduced in [1]. 
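
The greedy procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `greedy_select` and the sample data are hypothetical, and the sketch keeps running residuals for all vectors (a modified Gram-Schmidt update) instead of recomputing each projection from scratch, so its operation count is lower than the estimate given in the text.

```python
import math

def greedy_select(V, m):
    """Pick up to m indices of V, each time taking the vector whose residual
    (after removing its projection on the span chosen so far) is largest."""
    residual = [[float(x) for x in v] for v in V]
    chosen = []
    for _ in range(m):
        norms = [math.sqrt(sum(x * x for x in r)) for r in residual]
        best = max(range(len(V)), key=lambda i: norms[i])
        if norms[best] < 1e-12:
            break  # every remaining vector lies in the current span
        chosen.append(best)
        q = [x / norms[best] for x in residual[best]]
        # Gram-Schmidt step: remove the component along q from every residual.
        for r in residual:
            dot = sum(a * b for a, b in zip(r, q))
            for j in range(len(r)):
                r[j] -= dot * q[j]
    return chosen

# Hypothetical sample data: six vectors in R^3.
V = [(3, 0, 1), (1, 2, 0), (0, 1, 2), (2, 2, 2), (1, -1, 3), (-2, 1, 1)]
chosen = greedy_select(V, 3)
print(chosen)
```

The first index returned is that of the largest-norm vector, here (2, 2, 2). The product of the residual norms at the moments of selection equals the absolute determinant of the chosen set.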

In this paper we analyze the following refinement of the above algorithm: 

1. Choose a maximal number of iterations and denote it by max_J. 

2. Choose a number of vectors, k, to be replaced in each iteration. 

3. In each iteration up to max_J, search over all subsets of size k of V that are not in the 
currently chosen set, and check whether replacing k vectors of the current set by the 
new vectors will increase the determinant. 

4. If there is no set of size k which can increase the determinant by replacing k 
vectors of the current set, then terminate the algorithm. 
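
The refinement steps above can be sketched as follows. This is one possible reading of the procedure, not a definitive implementation: it accepts the first improving exchange it finds in each iteration (the text does not say which improving exchange to take), and it measures |det| as the product of Gram-Schmidt residual norms. The names `volume`, `exchange_refine` and `max_iter` (standing in for max_J) are hypothetical.

```python
import math
from itertools import combinations

def volume(vectors):
    """|det| of m vectors in R^m, as the product of Gram-Schmidt residual norms."""
    basis, vol = [], 1.0
    for v in vectors:
        r = [float(x) for x in v]
        for q in basis:
            dot = sum(a * b for a, b in zip(r, q))
            r = [a - dot * b for a, b in zip(r, q)]
        n = math.sqrt(sum(x * x for x in r))
        if n < 1e-12:
            return 0.0
        vol *= n
        basis.append([x / n for x in r])
    return vol

def exchange_refine(V, current, k=1, max_iter=100):
    """Repeatedly swap k chosen vectors for k unchosen ones while |det| grows."""
    current = list(current)
    best = volume([V[i] for i in current])
    for _ in range(max_iter):
        improved = False
        outside = [i for i in range(len(V)) if i not in current]
        for out in combinations(range(len(current)), k):
            for inc in combinations(outside, k):
                trial = current[:]
                for pos, new in zip(out, inc):
                    trial[pos] = new
                vol = volume([V[i] for i in trial])
                if vol > best * (1 + 1e-12):
                    current, best, improved = trial, vol, True
                    break
            if improved:
                break
        if not improved:
            break  # step 4: no exchange of size k helps
    return current, best

# Hypothetical sample data in R^2, starting from the first two vectors.
V = [(1, 0), (0, 1), (2, 0), (0, 3)]
subset, vol = exchange_refine(V, [0, 1], k=1)
print(subset, vol)
```

On this sample the procedure ends with the vectors (2, 0) and (0, 3), whose determinant is 6.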

Without getting into details on the complexity of such an algorithm the following result 
shows that there may be a considerable difference between the maximal determinant and 
the determinant obtained by the algorithm: 

Lemma 1: For any integers k, l with k < l, there exists an integer m > l, and a set of 2m 
vectors in $R^m$ such that: 

1. The "Gram-Schmidt" algorithm will choose the first m vectors. 

2. Replacing any k vectors will not increase the determinant. 

3. The maximal determinant is the determinant of the last m vectors and it is greater 
than the chosen determinant by a factor of $(m/k)^{m/2}$. 

In particular, there exist examples with large m where changing m-1 vectors will not 
increase the determinant while replacing all m vectors will increase the determinant by 
~sqrt(e). 

Proof: 

Given k, l, choose m > l to be an integer for which a Hadamard matrix exists. 

Let H be an m-dimensional orthonormal Hadamard matrix, i.e. $H = \frac{1}{\sqrt{m}} \tilde{H}$, where the 
elements of $\tilde{H}$ are all from the set {±1}. 
Let $A = \mathrm{diag}(1, 2^{-0.5}, 2^{-1}, 2^{-1.5}, \ldots, 2^{-(m-1)/2})$. 

The set of 2m vectors will be the columns of the matrices A and $B = \sqrt{m/k}\, A H$. 

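The norm of any column of B is: $\sqrt{\tfrac{1}{k}(1 + 2^{-1} + 2^{-2} + \cdots + 2^{-(m-1)})} < \sqrt{2/k}$. 

The norm of the last m-i rows of every column of B is less than $\sqrt{2^{1-i}/k}$, which for 
$k \ge 2$ does not exceed the norm $2^{-i/2}$ of the (i+1)-th column of A. Therefore none of 
the columns of B would have been chosen by the Gram-Schmidt algorithm (at least for 
$k \ge 2$). 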

Replacing any k columns of the matrix A by columns of B would not increase the 
determinant, since the norm of any column in a k-dimensional minor of $\sqrt{m/k}\,H$ is at most 
1. However, $\det(\sqrt{m/k}\,AH) = (m/k)^{m/2} \det(A)$. 

For k = m/2, we get $\det(\sqrt{m/k}\,AH) = 2^{m/2} \det(A)$, so even if replacing half of the columns 
of A does not increase the determinant, there can still be a substantial difference if all the 
columns are replaced. 

For k = m-1, this leads to $\det(\sqrt{m/k}\,AH) = (m/(m-1))^{m/2} \det(A) \approx \sqrt{e}\,\det(A)$. 
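
The construction in the proof can be checked numerically for a small case. The sketch below assumes m = 4 and k = 2 (so any l with k < l < m, e.g. l = 3, fits the lemma) and builds the Hadamard matrix by the Sylvester doubling construction, which is an assumption of this example and not specified in the text. All function and variable names are hypothetical.

```python
import math
from itertools import combinations

def volume(vectors):
    """|det| of m vectors in R^m, as the product of Gram-Schmidt residual norms."""
    basis, vol = [], 1.0
    for v in vectors:
        r = [float(x) for x in v]
        for q in basis:
            dot = sum(a * b for a, b in zip(r, q))
            r = [a - dot * b for a, b in zip(r, q)]
        n = math.sqrt(sum(x * x for x in r))
        if n < 1e-12:
            return 0.0
        vol *= n
        basis.append([x / n for x in r])
    return vol

def sylvester_hadamard(m):
    """+1/-1 Hadamard matrix of order m (m must be a power of two)."""
    H = [[1]]
    while len(H) < m:
        H = [r + r for r in H] + [r + [-x for x in r] for r in H]
    return H

m, k = 4, 2
a = [2 ** (-i / 2) for i in range(m)]            # diagonal of A
Ht = sylvester_hadamard(m)
scale = math.sqrt(m / k) / math.sqrt(m)          # sqrt(m/k) times 1/sqrt(m)
A_cols = [[a[i] if i == j else 0.0 for i in range(m)] for j in range(m)]
B_cols = [[scale * a[i] * Ht[i][j] for i in range(m)] for j in range(m)]

vol_A, vol_B = volume(A_cols), volume(B_cols)

# Item 1: at stage i the next column of A beats every residual column of B.
greedy_keeps_A = all(
    a[i] > max(math.sqrt(sum(b[r] ** 2 for r in range(i, m))) for b in B_cols)
    for i in range(m))

# Item 2: best |det| reachable by swapping any k columns of A for k columns of B.
best_swap = max(
    volume([A_cols[j] for j in range(m) if j not in out] + [B_cols[j] for j in inc])
    for out in combinations(range(m), k) for inc in combinations(range(m), k))

# Limit used for k = m-1: (m/(m-1))^(m/2) approaches sqrt(e) as m grows.
ratio_large = (1024 / 1023) ** (1024 / 2)

print(greedy_keeps_A, best_swap <= vol_A * (1 + 1e-9), vol_B / vol_A, ratio_large)
```

For these values the ratio `vol_B / vol_A` equals $(m/k)^{m/2} = 4$ up to rounding, while no exchange of k = 2 columns raises the determinant above det(A).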
CLAIMS 

We claim: 

1. A method substantially as described hereinabove. 

2. A method substantially as illustrated in any of the drawings. 

3. Apparatus substantially as described hereinabove. 

4. Apparatus substantially as illustrated in any of the drawings. 

5. A system substantially as described hereinabove. 

6. A system substantially as illustrated in any of the drawings. 